N-gram Parsing for Jointly Training a Discriminative Constituency Parser

Authors

  • Arda Çelebi
  • Arzucan Özgür
Abstract

Syntactic parsers are designed to detect the complete syntactic structure of grammatically correct sentences. In this paper, we introduce the concept of n-gram parsing, which corresponds to generating the constituency parse tree of n consecutive words in a sentence. We create a stand-alone n-gram parser derived from a baseline full discriminative constituency parser and analyze the characteristics of the generated n-gram trees for various values of n. Since the produced n-gram trees are in general smaller and less complex compared to full parse trees, it is likely that n-gram parsers are more robust compared to full parsers. Therefore, we use n-gram parsing to boost the accuracy of a full discriminative constituency parser in a hierarchical joint learning setup. Our results show that the full parser jointly trained with an n-gram parser performs statistically significantly better than our baseline full parser on the English Penn Treebank test corpus.
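
As a rough illustration of what a tree over n consecutive words looks like, the sketch below reads a bracketed constituency tree and lists every constituent that spans exactly n words. The tree reader, the function names, and the example sentence are assumptions made for illustration; the abstract does not say how the authors derive their n-gram trees, and their procedure may also cover word windows that do not form a single constituent.

```python
def parse_bracketed(text):
    """Read a bracketed tree, e.g. '(NP (DT the) (NN dog))', into (label, children)."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())   # nested constituent
            else:
                children.append(tokens[pos])  # leaf word
                pos += 1
        pos += 1  # skip the closing ")"
        return (label, children)

    return read()


def leaves(tree):
    """Return the words covered by a tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1] for word in leaves(child)]


def constituents_of_length(tree, n):
    """Return (start_index, label, words) for each constituent spanning exactly n words."""
    found = []

    def walk(node, start):
        if isinstance(node, str):
            return 1
        length = 0
        for child in node[1]:
            length += walk(child, start + length)
        if length == n:
            found.append((start, node[0], leaves(node)))
        return length

    walk(tree, 0)
    return found


# Hypothetical example sentence, not taken from the paper.
tree = parse_bracketed(
    "(S (NP (DT the) (NN dog)) (VP (VBD chased) (NP (DT a) (NN cat))))"
)
for start, label, words in constituents_of_length(tree, 2):
    print(start, label, " ".join(words))
```

With n = 2 this picks out the two noun phrases of the example sentence, which shows why such trees tend to be smaller and shallower than the full parse.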

Similar resources

Self-training a Constituency Parser using n-gram Trees

In this study, we tackle the problem of self-training a feature-rich discriminative constituency parser. We approach the self-training problem with the assumption that while the full sentence parse tree produced by a parser may contain errors, some portions of it are more likely to be correct. We hypothesize that instead of feeding the parser the guessed full sentence parse trees of its own, we...

Tree Kernels-based Discriminative Reranker for Italian Constituency Parsers

English. This paper aims at filling the gap between the accuracy of Italian and English constituency parsing: firstly, we adapt the Bllip parser, i.e., the most accurate constituency parser for English, also known as Charniak parser, for Italian and train it on the Turin University Treebank (TUT). Secondly, we design a parse reranker based on Support Vector Machines using tree kernels, where ...

A Discriminative Model for Joint Morphological Disambiguation and Dependency Parsing

Most previous studies of morphological disambiguation and dependency parsing have been pursued independently. Morphological taggers operate on n-grams and do not take into account syntactic relations; parsers use the “pipeline” approach, assuming that morphological information has been separately obtained. However, in morphologically-rich languages, there is often considerable interaction betwe...

Word Segmentation, Unknown-word Resolution, and Morphological Agreement in a Hebrew Parsing System

We present a constituency parsing system for Modern Hebrew. The system is based on the PCFG-LA parsing method of Petrov et al. (2006), which is extended in various ways in order to accommodate the specificities of Hebrew as a morphologically rich language with a small treebank. We show that parsing performance can be enhanced by utilizing a language resource external to the treebank, specifical...

Journal:
  • Polibits

Volume 47

Publication date: 2013